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Abstract 


In  barrier  methods  for  constrained  optimization,  the  main  work  lies  in  solv¬ 
ing  large  linear  systems  Kp  =  r,  where  K  is  symmetric  and  indefinite. 

For  linear  programs,  these  KKT  systems  are  usually  reduced  to  smaller 
positive-definite  systems  AH~iATq  =  s,  where  H  is  a  large  principal  submatrix 
of  K.  These  systems  can  be  solved  more  efficiently,  but  AH~lAT  is  typically 
more  ill-conditioned  than  K. 

In  order  to  improve  the  numerical  properties  of  barrier  implementations,  we 
discuss  the  use  of  “reduced  KKT  systems”,  whose  dimension  and  condition  lie 
somewhere  in  between  those  of  K  and  AH~lAT.  The  approach  applies  to  linear 
programs  and  to  positive  semidefinite  quadratic  programs  whose  Hessian  H  is 
at  least  partially  diagonal. 

We  have  implemented  reduced  KKT  systems  in  a  primal-dual  algorithm 
for  linear  programming,  based  on  the  sparse  indefinite  solver  MA27  from  the 
Harwell  Subroutine  Library.  Some  features  of  the  algorithm  are  presented, 
along  with  results  on  the  neilib  LP  test  set. 

Keywords:  linear  programming,  quadratic  programming,  indefinite  systems, 
KKT  systems,  barrier  methods,  interior-point  methods. 
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Reduced  K  KT  Systems 


1.  Introduction 


We  discuss  harrier  methods  for  solving  linear  and  quadratic  programs  expressed  in 
the  standard  form 

minimize  cTx  +  \xTQx 

X  (1-1) 

sub  ject  to  Ax  =  b,  l  <  x  <  u, 


where  A  is  m  x  n  (m  <  n)  and  Q  is  symmetric  positive  semidefinite.  The  problem 
is  a  linear  program  (LP)  when  Q  =  0,  and  a  quadratic  program  (QP)  otherwise. 
We  assume  that  an  optimal  solution  (x*,n*)  exists,  where  7r*  is  a  set  of  Lagrange 
multipliers  for  the  constraints  Ax  =  6. 

Implicit  within  most  of  the  current  barrier  or  interior-point  algorithms  is  a  so- 
called  KKT  system  of  the  form 


(1.2) 


in  which  II  =  Q  -fi  D  where  D  is  positive  semidefinite  and  diagonal.  The  search 
direction  (Ax,  An)  is  used  to  update  the  current  solution  estimate  (x,n).  In  some 
cases  it  is  obtained  by  solving  the  same  KKT  system  with  more  than  one  right-hand 
side.  It  is  critical  that  such  systems  be  solved  quickly  and  reliably. 


1.1.  Reduced  KKT  Systems 

If  II  is  nonsingular  and  diagonal  (as  in  the  LP  case;,  it  is  common  to  use  it  as  a 
block  pivot  and  reduce  (1.2)  to  a  system  involving  AII~1AT.  In  general  this  is  an 
unstable  process  because  f>  usually  contains  some  very  small  diagonals.  The  main 
advantages  are  that  nh  '  is  much  smaller  than  I\  and  it  is  positive-definite. 

Our  aim  is  to  discuss  an  intermediate  strategy  in  which  part  of  II  is  used  as 
a  block  pivot.  The  reduced  systems  obtained  are  considerably  smaller  than  A’, 
and  typically  no  more  than  twice  as  large  as  AH~lAT.  The  approach  retains  the 
numerical  reliability  of  factoring  the  full  matrix  K,  with  an  efficiency  that  is  closer 
to  that  of  using  AH~lAT.  It  also  provides  a  convenient  way  of  dealing  with  free 
variables  and  dense  columns  (i.e.,  variables  with  bounds  -oo  <  x3  <  oo  and  columns 
of  A  that  have  many  nonzeros). 

In  the  QP  case  when  Q  is  at  least  partly  diagonal,  reduced  KKT  systems  can 
still  be  formed  efficiently. 

The  proposed  use  of  reduced  KKT  systems  is  motivated  by  the  sensitivity  anal¬ 
ysis  in  [Pon90]  and  by  the  investigation  of  preconditioners  for  KKT  systems  in 
(GMPS90).  Note  from  both  references  that  large  diagonals  in  II  give  I\  a  decep¬ 
tively  high  condition  number,  but  they  do  not  cause  sensitivity  in  the  solution  of 
systems  involving  K.  (Similarly,  a  system  Dx  =  b  with  D  =  diag(1020,  lO10, 1,1,1) 
has  a  well-defined  solution,  even  though  cond (D)  =  1020.) 

In  fact,  large  diagonals  of  II  are  desirable,  since  they  arc  obvious  candidates  for 
a  block  pivot.  For  example,  if  ||/t||  «  1,  any  diagonals  II]3  significantly  larger  than 
one  could  be  included  in  the  block  pivot..  The  associated  reduced  system  reflects  the 
true  sensitivity  of  linear  systems  involving  1\. 


2.  A  Primal-Dual  QP  Algorithm 
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2.  A  Primal-Dual  QP  Algorithm 

For  concreteness  we  describe  the  main  parts  of  a  primal-dual  barrier  algorithm  for 
LP  and  QP.1  After  deriving  the  KKT  systems  to  be  solved,  we  are  able  to  discuss 
certain  numerical  issues. 

In  order  to  handle  upper  and  lower  bounds  symmetrically,  we  slightly  generalize 
the  approach  of  Lustig  et  al.  [LMS89,LMS90]  and  restate  problem  (1.1)  as  follows: 

minimize  cTx  +  \xTQx  +  •i|[Ta;![ 

subject  to  Ax  -\-6p  = 

X  —  Si  = 

X  +52  = 

with  s: ,  52  >  0.  The  scalars  7  and  6  are  intended  to  be  “small”.  The  dual  variables 
associated  with  the  three  sets  of  equality  constraints  will  be  denoted  by  7r,  2  and 
-y.  At  a  solution,  z  and  y  are  non-negative. 

We  assume  that  l  <  u,  since  fixed  variables  ( lj  =  Uj)  can  be  absorbed  into  b.  If 
lj  =  -00  or  Uj  =  +00,  we  omit  the  corresponding  equation  in  x-sj  =  l  or  X+S2  =  u. 
(In  particular,  both  equations  are  omitted  for  free  variables.)  Symmetric  treatment 
of  the  bounds  via  si  and  S2  allows  a  problem  to  be  treated  “as  it  stands”,  without 

converting  the  bounds  to  0  <  x  <  u  (say).  The  latter  practice  is  hazardous  in  the 

case  of  “almost  free  variables”,  whose  bounds  are  not  large  enough  to  be  treated  as 
infinite  (e.g.,  -106  <  Xj  <  106). 


+  1 
b, 

u , 


(2.1) 


2.1.  The  Barrier  Subproblem 

For  some  scalar  n  >  0,  the  associated  barrier  subproblem  is  to  minimize 

cTx  +  \xtQx  +  +  ^IIpII2  -  p£ln(s i)j  -  /i^ln(52)j 

i  j 


subject  to  the  same  equality  constraints.  Optimality  conditions  for  this  subproblem 
include  the  equation  p  =  Sx.  We  can  therefore  eliminate  p  immediately.  The 
remaining  optimality  conditions  may  be  stated  as  the  following  equations: 


1  b  -  Ax  -  62 x  ^ 

(  T  \ 

l  -  X  +  5l 

h 

U  -  X  -  52 

*2 

c  +  Qx  +  72x  -  Atx  -  z  +  y 

f 

pe  -  SiZe 

V\ 

\  fie-SiYe  j 

\ v*  / 

(2-2) 

where  e  is  a  column  of  ones  and  Si,  S 2,  Z,  Y  are  diagonal  matrices  composed  from 
s2<  zi  V •  Primal-dual  methods  maintain  positive  approximations  to  all  of  the 
latter  vectors. 


1Related  references  for  LP  are  [MegSS.KMYSS.MMSSOjLMSSD.LMSOO.MehSO.MchDO.MehOl]. 
Some  dealing  explicitly  with  QP  are  [MARSS.MASD.AHRTDtLCLMSflO.JSSOO.PonOCLVC!)!]. 
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2.2.  The  Newton  Direction 

The  Newton  equations  for  generating  a  search  direction  {Ax,  Asi,As2,  At,  Az .  Ay) 
are 


AAx 

+  67At 

Ax 

-  Asi 

Ax 

+  As2 

(Q  +  7 1)Ax 

+  ATAir 

+  Az 

-  Ay 

ZAs ! 

-f-  S\Az 

YAs2 

+  S2Ay 

r 

ti 

h 

t 


(2.3) 


v\ 

V2, 


which  may  be  solved  by  defining 

H0  =  Q  +  S?Z  +  SZlY, 
w  =  Si'iZti  +vi)  +  St1(Yt2-v2)~t, 

solving  the  KKT-like  system 


K  = 


Ho  +  7 2 1  AT  \ 

A  -62I  )  ’ 


and  finally  solving  the  equations 


(2.4) 


(2.5) 


=  Ax-ti, 
As2  —  t2  —  Ax, 
S\Az  =  vi  -  ZAsi, 
S2Ay  =  t>2  -  y^s2- 


(2.6) 


It  is  straightforward  to  show  that  any  values  of  x  and  t  give  the  same  search 
direction  ( As\,As2,Az,Ay ). 


2.3.  Regularization 

The  above  definition  of  K  shows  its  dependence  on  7  and  S.  Later  we  shall  not  need 
Ho  itself,  but  will  work  with  H  =  H 0  +  7 2/. 

The  perturbations  involving  7  and  6  are  included  to  “regularize”  the  problem. 
The  ‘-.c-im  |||7x||2  ensures  that  ||z*||  is  bounded,  and  the  term  6p  allows  Ax  =  b  to 
be  satisfied  in  some  least- squares  sense  if  the  original  constraints  have  no  feasible 
sola  lion.  An  important  property  is  that  both  perturbations  help  preserve  the  non¬ 
singularity  of  K.  For  example,  if  A  does  not  have  full  row  rank,  K  is  singular  unless 
we  choose  6  >  0.  Similarly,  suppose  some  columns  of  A  associated  with  free  vari¬ 
ables  are  linearly  dependent;  if  the  corresponding  columns  of  IIo  are  also  dependent 
(e.g.  if  the  problem  is  an  LP),  K  is  singular  unless  we  choose  7  >  0.  (An  alternative 
means  for  preserving  nonsingularity  with  free  variables  is  given  in  [GMPS91].) 

Since  p  =  6t  at  a  solution,  the  regularization  terms  in  the  objective  function  are 
effectively  £||7a:||2  -f  ^li^H2,  which  has  a  satisfying  symmetry  and  shows  that  both 
x*  and  7r*  are  bounded  if  7  and  8  are  nonzero. 


2.  A  Primal-Dual  QP  Algorithm 
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Objective  perturbations  of  the  form  ^  |j-yx|j2  have  been  studied  by  Mangasarian  et 
at.  [MM79,Man84],  who  show  that  the  LP  solution  is  not  perturbed  if  7  is  sufficiently 
small.  The  approach  has  been  successfully  pursued  in  the  interior-point  context  by 
Setiono  [Set90b]. 

An  alternative  form  of  regularization  is  the  proximal-point  method  of  Ttockafellar 
[Roc76],  which  involves  an  objective  term  of  the  form  ^||7(a:  -  *fc)||2  and  does  not 
perturb  the  problem  as  x*  — ►  x*  even  if  7  is  not  particularly  small.  Again,  this 
approach  has  been  successfully  explored  by  Setiono  [Set89,Set90a,Set90c]. 

2.4.  A  Predictor-Corrector  Approach 

A  number  of  authors  (e.g.  [MAR88,KLSW89,JSS90])  have  suggested  alternatives 
to  the  Newton  direction.  Most  of  these  suggestions  may  be  traced  to  the  idea  of 
“extrapolation”  first  described  by  Fiacco  and  McCormick  [FM68].  The  algorithm 
we  have  implemented  is  similar  to  that  suggested  by  Mehotra  [Meh89,Meh90].  Im¬ 
plementations  based  on  his  suggestion  have  been  shown  to  be  efficient  in  practice 
(see  [Meh89,Meh90,LMS90]). 

The  approach  requires  two  solves  of  the  Newton  system  (2.3)  with  different 
vectors  «i  and  V2  in  the  right-hand  side.  A  predictor  step  (Ax,  As\,  As?,  Arc,  Az,  Ay) 
is  obtained  by  solving  with  v\  =  -S\Ze  and  V2  =  -S^Ye  (i.e.  p  =  0  in  (2.2)).  A 
corrector  direction  is  then  computed  using  v\  =  pe  -  S\Ze  -  AS\AZe  and  i>2  = 
pe  -  S2Ye  -  ASiAY e. 

If  the  predictor  step  is  “large”  it  seems  possible  that  a  poor  corrector  direction 
will  result.  Therefore  as  a  precaution  we  sometimes  scale  the  predictor  step  down 
before  constructing  the  second  t>i  and  v^.  Let  ax  and  az  be  maximum  steps  along 
the  predictor  direction  that  keep  (si,S2)  and  ( z,y )  non-negative,  and  let  4>  =  0.1. 

If  ax  <  <f> ,  we  consider  that  the  predictor  steps  As\  and  As?  are  excessively  large 
and  scale  them  down  by  the  factor  r( 2  -  t),  where  r  =  ax/4>- 

Similarly  if  az  <  4>,  we  scale  Az  and  Ay  down  by  the  factor  r( 2  -  r),  where 
t  =  dcz/4>.  (Note  that  0  <  r( 2  -  r)  <  1  in  both  cases.) 

We  have  experimented  with  a  few  other  values  of  <fi  but  not  observed  any  pro¬ 
found  effect.  The  value  4>  -  0  corresponds  to  accepting  the  predictor  step  as  it 
stands,  and  <j>  =  1  would  “interfere”  rather  too  often.  The  chosen  value  0.1  gave 
slightly  better  performance  than  0,  as  measured  by  the  number  of  times  that  the 
corrector  direction  was  accepted. 

It  can  be  demonstrated  that  the  corrector  direction  is  not  necessarily  a  descent 
direction  for  ||/^||2.  We  are  therefore  prepared  to  fall  back  on  the  Newton  direction 
as  described  below.  Different  but  analogous  precautions  are  taken  by  Mehrotra 
[Meh90]. 

2.5.  Steplengths 

Given  a  positive  vector  z  and  a  search  direction  Az,  we  need  a  trial  steplcngth  a- 
that  keeps  z  +  <xzAz  positive.  It  is  common  practice  to  define  such  a  steplcngth  as 

az  =  o  min  zJ\Az2\, 

Az}<  0  Jl 
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where  a  €  (0, 1).  Such  a  steplength  might  be  written  as  the  function  ax  =  a(z), 
with  Az  and  a  known  from  the  context.  In  our  case  we  need  two  such  steplengths, 

ax  =  a(si,s2)  and  az  -  a (z,y), 

where  we  mean  that  ax  keeps  both  si  and  s2  positive,  and  az  keeps  both  z  and 
y  positive.  Following  [LMS90]  we  used  a  —  0.99995  for  the  numerical  results  of 
Section  7. 

Given  a  descent  direction,  the  usual  procedure  in  a  descent  method  is  to  take  a 
step  along  the  direction  that  reduces  some  merit  function.  In  our  case  it  is  not  clear 
that  a  suitable  reduction  in  \\f^ |j2  can  be  obtained  using  the  same  step  a(-si,s2,z,i/) 
in  all  the  variables.  It  has  been  observed  in  practice  [LMS89,LMS90]  that  taking 
different  steps  in  the  primal  and  dual  variables  leads  to  fewer  iterations  when  solving 
linear  programs.  One  reason  for  this  is  clear.  Suppose  we  are  solving  an  LP  with 
6  =  7  =  0.  After  a  step  ax  in  the  primal  variables  and  az  in  the  dual  variables  we 
have 

r  ♦—  (1  -  ax)r  and  t  *- (1  -  at)t. 

(We  assume  r  ^  0,  t  /  0,  ax  <  1  and  az  <  1.)  If  we  take  the  same  step  in  the  primal 
and  dual  variables  and  wish  to  remain  feasible  to  the  same  extent,  the  required  step 
is 

=  min(ar,a,), 

and  the  total  reduction  in  |jr||2  +  ||f!]2  will  not  be  as  great.  After  20  or  30  iterations 
the  reduction  attained  by  the  two  strategies  may  be  significantly  different. 


2.6.  The  Linesearch 

Gill  et  al.  [GMPS91,  Section  7.1]  analyze  a  primal-dual  barrier  algorithm  for  linear 
programming  based  on  the  Newton  direction  and  separate  steps  in  the  primal  and 
dual  variables.  They  show  that  the  component  of  the  Newton  direction  in  the  primal 
variables  is  a  descent  direction  for  a  merit  function  based  on  the  primal  barrier 
function.2  It  is  proved  prove  that  the  iterates  converge  to  a  solution  provided  the 
step  taken  in  the  primal  variables  reduces  this  merit  function.  (Such  a  step  always 
exists.)  Moreover,  from  the  nature  of  the  merit  function  and  the  fact  that  the 
Newton  direction  is  a  direction  of  sufficient  descent,  it  is  clear  that  almost  any 
nonzero  step  in  the  primal  variables  will  suffice.  The  step  in  the  dual  variables  is 
almost  arbitrary.  The  only  requirement  is  that  it  be  bounded  and  that  z  and  y  be 
kept  nonzero. 

The  merit,  function  in  our  implementation  is  different  from  that  advocated  in 
[GMPS91]  but  is  similar  in  spirit.  We  require  that  ||/J|2  be  reduced.  We  anticipate 
that  a  reduction  in  this  function  almost  always  implies  a  reduction  in  M(x,si,s2,/j). 


2For  LP  problems  of  the  form  (2.1)  without  regularization,  the  merit  function  would  be 
M(x,si,sj,p)  =  eTz-/*£jln(*i)j  - /»£,  ln(s2);  4-  p(j|i>  -  Ax\\i  +  ||/-r  +  si||i  +  ||«  -  x  -  s2||i)- 
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Briefly,  if  a  trial  step  along  the  corrector  direction  reduces  ||/M||2,  the  d°D  is 
taken  and  the  iteration  is  complete.  Otherwise,  the  Newton  direction  is  computed 
and  used  to  reduce  ||/M||2  (perhaps  with  the  aid  of  a  back- tracking  linesearch). 
Convergence  is  assured  if  K  remains  sufficiently  nonsingular. 

More  precisely: 

1.  Separate  steplengths  a*  =  a(si,S2)  and  a,  =  a(z,y)  are  computed  for  the 
corrector  direction.  If  these  reduce  the  merit  function  jj/M||2  by  a  sufficient 
amount,  they  are  accepted  and  the  ite-ation  is  complete. 

2.  Otherwise,  the  Newton  direction  (2.3) !"  computed  along  with  new  steplengths 
ax  and  ax.  If  the  merit  function  is  sufficiently  reduced,  the  steps  are  taken 
and  the  iteration  is  complete. 

3.  Otherwise,  the  steplengths  are  made  equal  (to  the  smaller  of  the  two)  and 
the  merit  function  is  tested  again.  If  necessary,  the  steplengths  are  repeatedly 
halved  until  the  merit  function  is  suitably  reduced. 

4.  If  up  to  5  halvings  of  the  steplengths  fail  to  reduce  the  merit  function,  we 
assume  that  the  search  direction  has  insufficient  accuracy.  The  linesearch 
terminates  with  a  request  for  stricter  tolerances  in  factorizing  the  KKT  system 
(see  Section  4).  In  practice,  this  is  perhaps  the  most  important  function  of 
the  linesearch. 

2.7.  Reducing  the  Barrier  Parameter  n 

The  initial  and  minimum  values  of  /x  are  to  some  extent  user-specifed;  see  Section  6. 
At  intervening  iterations,  /x  is  reduced  to  (1  -  a^)/x,  where  =  min(ar, ax,a). 
Thus,  n  decreases  monotonically  and  more  rapidly  if  larger  steps  are  taken. 

This  choice  of  /x  is  simple  and  does  not  depend  on  the  duality  gap  (which  is 
difficult  to  define  for  infeasible  points).  It  appears  to  be  satisfactory  in  practice. 
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3.  Reduced  KKT  systtn.s 

If  H  =  Ho  +  72I  is  nonsingular,  ‘s:'.5)  may  be  solved  using  the  range-space  equations 
of  optimization: 

(AH~1Ar+  6*I)A tt  =  r  -  AH~lw,  II  Ax  =  ATAn  +  to.  (3.1) 

The  main  benefit  is  that  AH~lAT  f  6*1  is  much  smaller  than  K  and  is  positive 
definite,  so  that  well-established  sparse  Cholesky  factorizations  can  be  applied.  Some 
drawbacks  are  as  follows: 

1.  AH~XAT  +  6*1  is  normally  more  ill-conditioned  than  K. 

2.  Free  variables  complicate  direct  use  of  (3.1)  by  making  Ho  singular. 

3.  Relatively  dense  columns  in  A  degrade  the  sparsity  of  AH~lAT  +  6*1  and  its 
factors. 


In  general,  small  diagonals  of  H  prevent  it  from  being  a  “good"  block  pivot  from 
a  numerical  point  of  view.  That  is,  if  Gaussian  elimination  were  applied  to  K,  not 
all  diagonals  of  H  would  be  acceptable  as  pivots.  To  overcome  this  difficulty,  we 
note  that  in  practice  most  of  H  is  likely  to  be  acceptable  as  a  block  pivot.  Thus  we 
partition  H  and  A  as 


H  = 


A  =  (  N  B), 


where  the  diagonals  of  HN  are  a  “reasonable”  size  compared  to  the  nonzeros  in  the 
corresponding  columns  of  N.  In  general,  a  reordering  of  the  variables  is  implied. 
The  column  dimension  of  N  may  be  anywhere  from  0  to  n  (and  similarly  for  B). 
The  KKT  system  (2.5)  becomes 


/ 

Hn 

Nt  \ 

1  Axn  n 

(  V>H  \ 

hb 

bt 

Axb 

- 

Wb  \ 

\ 

N 

B 

-6*1  j 

{-A;  , 

V  r  ) 

which  may  be  solved  via  the  reduced  IiKT  system 

r  {  Axg  \  (  wB  \  (  II B  Bt  \ 

"{-A*  )~{r-NII;'wN  )'  D~\b  -NII~lNr-6*I  )' 

(3.3) 

The  final  step  is  to  solve  the  typically  diagonal  system  HnAxn  =  wN  +  NrA-n. 

If  IIN  happens  to  be  all  of  II,  KB  =  -AII~1AT  -  6*1  and  the  reduced  KKT 
system  is  equivalent  to  (3.1).  Otherwise,  Kn  is  a  symmetric  indefinite  matrix.  Like 
K  it  can  be  processed  by  a  sparse  indefinite  solver  such  as  MA27  [DR82,DRS3] 
(an  implementation  of  the  factorization  described  in  [BP71.BK77]).  Tire  above 
difficulties  are  resolved  as  follows: 


S.  Reduced  KKT  systems 
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1.  Kb  need  not  be  more  ill-conditioned  than  K. 

2.  Free  variables  are  always  included  in  B. 

3.  Dense  columns  of  A  are  also  included  in  B.  (They  do  not  necessarily  degrade 
the  sparsity  of  the  I(B  factors.) 

As  with  current  Cholesky  solvers,  MA27  has  an  Analyze  phase  (to  choose  a  row 
and  column  ordering  for  KB  and  to  set  up  a  data  structure  for  its  factors)  and  a 
Factor  phase  (to  compute  the  factors  themselves).  Some  disadvantages  of  working 
with  reduced  KKT  systems  are: 

1.  Whenever  the  N-B  partition  is  altered,  a  new  Analyze  is  required.  This  is 
usually  less  expensive  than  the  Factor  phase,  often  by  an  order  of  magnitude, 
and  we  expect  it  to  be  needed  only  sometimes  (typically  the  last  few  iterations). 
However,  it  can  be  costly  for  certain  structures  in  A. 

2.  To  date,  sparse  indefinite  factorization  is  more  expensive  than  Cholesky  fac¬ 
torization  when  there  is  a  large  degree  of  indefiniteness  (as  in  KKT  systems). 

3.1.  Related  Work 

To  keep  II  nonsingular  in  the  presence  of  free  variables,  some  authors  have  treated 
each  such  variable  as  the  difference  between  two  nonnegative  variables..  Lustig, 
et  al.  [LMS89,LMS90]  report  satisfactory  performance  on  the  netlib  LP  test  set, 
which  contains  a  few  relevant  examples.  To  deal  with  problems  involving  many  free 
variables,  others  authors  have  introduced  movirg  artificial  bounds  (o.g.  (Maix89) 
and  Vanderbei  [Van90a],  who  cites  difficulties  with  the  first  approach). 

Dense  columns  have  been  handled  by  using  Cholesky  factors  of  the  sparse  part 
of  AH~1At to  precondition  the  conjugate-gradient  method  (e.g.  [GMSTW86])  or  to 
form  a  certain  Schur  complement  |MarxS9,LMS89].  A  difikul.y  is  that  the  “sparse” 
Cholesky  factor  usually  becomes  even  more  ill-conditioned  than  the  one  associated 
with  all  of  AII~1At.  More  recently,  the  approach  of  splitting  or  stretching  dense 
columns  has  been  proposed  and  implemented  with  success  [LMC89,Van90b],  The 
increased  problem  size  is  perhaps  an  inconvenience  if  not  a  difficulty.  (See  also  Grcar 
[Grc90],  who  recommends  the  term  stretching  and  gives  an  extremely  thorough 
development  and  analysis  of  this  new  approach  to  solving  sparse  linear  equations,) 
Further  recent  work  on  solving  large  indefinite  systems  (to  avoid  the  difficulties 
associated  with  AH~1AT)  appears  in  [Tur90,Meh91,FM91,VC91],  Iterative  solution 
of  the  full  KKT  system  is  explored  in  [GMPS90]. 

3.2.  Quadratic  Programs 

In  general,  the  Hessian  of  a  convex  quadratic  objective  will  have  the  form 
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where  D  is  diagonal  and  positive  definite,  and  the  full  Q  is  positive  semidefinite. 
The  dimensions  of  each  partition  depend  on  the  application  and  could  be  anything 
from  0  to  n. 

The  associated  matrix  H  =  Q  +  S^Z  +  S^Y  has  the  structure 


(  H 


H  = 


\ 


D 

E 


(3.5) 


where  H  and  D  are  diagonal.  Reduced  KKT  systems  can  be  formed  as  before,  using 
suitably  large  diagonals  in  diag(tf ,  .D)  as  a  block  pivot.  Variables  associated  with 
Q  are  most  easily  dealt  with  by  keeping  them  in  the  aBn  partition  of  K  (alongside 
those  associated  with  dense  columns  of  A). 

A  diagonal  of  diag {H,D)  may  be  judged  “suitably  large”  by  comparison  with 
the  corresponding  column  of  A.  (Since  Q  and  H  are  semidefinite,  there  is  no  need 
to  compare  a  diagonal  of  D  with  the  corresponding  columns  of  E.) 


3.3.  Separable  QP 

To  match  the  OBI  implementation  based  on  AH~XAT  [LMS90],  Carpenter  et  al. 
[CLMS90]  have  emphasized  the  case  where  Q  and  E  are  null  and  Q  is  purely  di¬ 
agonal.  Any  convex  QP  can  be  transformed  into  this  case  by  using  the  Cholesky 
factorization  PQPT  —  LLT ,  where  P  is  a  permutation  matrix  and  L  is  lower  trian¬ 
gular  (and  possibly  trapezoidal).  Introducing  linear  constraints  of  the  form 

Xi  =  LtPxq 

leads  to  a  larger  problem  of  the  required  form. 

Such  an  approach  may  often  be  satisfactory,  especially  if  the  dimension  of  Q  is 
relatively  low.  However,  some  implementations  based  on  AIl~xAT  would  have  diffi¬ 
culty  with  the  additional  free  variables  xi.  Even  if  these  are  dealt  with  “correctly” 
via  the  KKT  system,  the  factors  of  the  reduced  KKT  systems  (which  now  involve 
L)  are  likely  to  be  more  dense  than  in  the  preceding  direct  approach. 

Further  discussion  of  this  subject,  along  with  numerical  comparisons,  appears  in 
[VC91]. 

In  summary,  the  use  of  reduced  KKT  systems  for  structured  sparse  QP  Hessians 
(3.4)  provides  greater  flexibility,  and  hence  greater  efficiency  in  at  least  some  cases, 
than  use  of  the  full  KKT  system  [Pon90l  or  the  fully  reduced  system  AII~XA 7 
[CLMS90]. 


4.  Stability  and  Sparsity  Tolerances 
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4.  Stability  and  Sparsity  Tolerances 

Let  S  =  -(N  H^NT  +  62I)  be  the  Schur  complement  appearing  in  the  south-east 
corner  of  KB  (3.3).  (We  assume  HN  is  diagonal.)  In  our  implementation,  three 
parameters  are  used  to  control  the  choice  of  N  and  the  factorization  of  KB ,  with 
typical  values  as  follows: 

Htol  =  10-6, 
ndense  =  10, 
factol  =  0.01. 

Note  that  small  diagonals  of  IIN  lead  to  large  entries  in  S,  while  “dense”  columns 
in  N  lead  to  excessive  density  in  S.  We  therefore  use  Htol  and  ndense  to  control 
the  partitioning  of  K  in  the  following  way.  The  j- th  column  of  //  is  included  in  HN 
(and  column  aj  of  A  is  included  in  N)  if 

1.  Hjj  >  Htol\\aj\\,  and 

2.  ctj  has  fewer  than  ndense  nonzeros. 

Since  we  sca’e  A  to  give  ||a,j|  «  1  for  all  j ,  we  avoid  storing  an  72-vector  of  norms 
by  simplifying  the  first  test  to  just  Hjj  >  IIiol.  Including  ||aj||  would  give  slightly 
greater  reliability. 

The  third  parameter  factol  is  used  as  the  stability  tolerance  u  [DIt82,DR83] 
when  the  Factor  phase  of  MA27  is  applied  to  I\D.  In  the  extreme  case  ( N  void, 
KB  -  -AH-'At),  factol  is  inoperative  since  MA27  is  then  performing  Cholesky 
factorization  on  a  (negative)  definite  matrix.  In  all  other  cases,  factol  affects  the 
stability  of  the  numerical  factorization  and  the  fill-in  in  the  factors  (beyond  that 
predicted  by  the  MA27  Analyze). 

4.1.  Iterative  Refinement;  Tightening  Tolerances 
Whenever  a  KKT  system  of  the  form 


is  solved  (via  a  reduced  KKT  system),  we  estimate  whether  the  tolerances  Htol  and 
factol  are  too  lax  by  computing  the  residuals  for  the  full  system: 

( ’■  U  (  “  )  -  a- (  a*W»-/r4*  +  ,U) 

\  92  /  V  r  /  \-Zl7r  J  \  r  -  AAx  -  6Ax  J 

Let  the  relative  error  in  the  residua!  be  error  =  (jjr/ijl  T  ||<72u)/(iMI  I  il?'||)- 

If  error  >  0.01,  we  request  that  Htol  be  increased  by  a  factor  of  100  and  factol  be 
increased  by  a  factor  of  10,  up  to  limits  of  0.1  and  0.2  respectively.  The  partitioning 
of  K  and  the  factorization  of  the  reduced  KKT  system  are  then  repeated.  If  the 
tolerances  are  already  at  their  limiting  values,  the  program  terminates  with  an  error 
condition. 
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Otherwise,  if  error  >  10“4,  we  perforin  one  step  of  iterative  refinement: 

•(-;)■(:)■  (:)-(;)*(:)■ 

If  the  new  relative  error  satisfies  error  >  10“4,  we  request  increased  tolerances  as 
in  the  previous  paragraph. 

If  error  <  10-4  either  before  or  after  refinement,  the  solve  is  taken  to  be  suffi¬ 
ciently  accurate. 

The  predictor- corrector  algorithm  uses  more  than  one  solve  to  obtain  a  search 
direction.  Forming  the  products  AAx  and  ATAn  in  (4.1)  is  moderately  expensive, 
but  at  least  for  the  last  solve,  these  vectors  can  be  saved  and  used  to  update  the 
residuals  r  and  t  during  the  linesearch: 

r  *-  r  -  ax(AAx)  -  azS2Airt 
t  t  +  ax(Q  +  7 2I)Ax  -  az(ATATr  +  Ay  -  Az). 

Direct  computation  of  r  =  6  -  Ax  -  62i r  and  t  =  c  +  Qx  +  j2x  -  Atk  -  z  +  y  can 
then  be  carried  out  less  often — say  every  10  iterations,  or  whenever  ||r||  or  j|l||  drops 
significantly. 

Note  that  it  is  most  effective  to  use  iterative  refinement  on  the  full  KKT  system 
as  described,  not  on  the  reduced  system.  In  particular,  even  implementations  bused 
on  AH~'At  should  compute  corrections  for  both  Ax  and  Air. 


5.  Numerical  Examples 
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5.  Numerical  Examples 

To  illustrate  some  numerical  values  arising  in  the  reduced  KKT  systems  of  Section 
3,  we  apply  the  basic  primal-dual  algorithm  to  two  LP  problems  of  the  form 

min  c3®  subject  to  Ax  =  6,  x  >  0, 


with 


(  1  1  3  3 
\  1  2  1  2 


)• 


using  MATLAB™  [MLB87]  with  about  16  digits  of  precision. 


5.1.  A  Non-degenerate  LP 

We  first  let  the  right-hand  side  and  optimal  solution  be 

*=(!)•  **=(°  °  >  01 

At  the  start  of  the  fifth  iteration  of  the  primal-dual  algorithm,  we  have 
x-  (3.9e— 6,  3.2e-6,  1.000005,  0.999992),  2  =  (0.70,  0.70,  2.0e-6,  1.7e-6),  and 

(  1.8e+5  1 

2.2e+5  1 

r.  2.0e-6  3 

1.7e-6  3 

113  3 

<1  2  1  2 

where  H  =  X~lZ.  Recall  that  Htol  defines  which  diagonals  of  II  are  considered 
large  enough  to  form  a  block  pivot.  In  terms  of  conventional  error  analysis  for 
Gaussian  elimination,  Htol  =  1  or  0.1  should  be  “safe”,  while  Htol  <  10-3  (say) 
is  likely  to  be  unreliable.  Various  values  of  Iltol  give  the  following  reduced  KKT 
systems: 


Htol 


Kb 


cond(A'fl) 


0.1 


1.8e-6 


(  2.0e-6  3 

1.7e-6  3 

3  3  — le-5 

\  1  2  -le-5 

/  1  7e-6  3 

3  -4e+6 

^  2  -le+6 


1  \ 

2 

-le-5 
— 2e-5  / 

2  > 
-le+6 
— 5e+5  , 


(  -le+7  -5e+6 
^  -5e+6  -3e+6 


7.5 


5e+6 


le-20 


58 


H 
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The  large  diagonals  of  H  make  K  seem  rather  ill-conditioned  (cond(A')  =  10s), 
but  pivoting  on  those  diagonals  ( Htol  =  0.1)  gives  a  very  favorable  reduced  system: 
cond(A'a)  =  7.5.  Allowing  one  small  pivot  ( Htol  =  1.8e-6)  gives  the  expected  large 
numbers  and  high  condition:  cond (Kb)  ~  106.  A  second  small  pivot  would  normally 
have  a  similar  effect,  but  here  we  have  m  =  2.  The  large  numbers  arising  from  m 
small  pivots  happen  to  form  a  very  toe//-conditioned  reduced  system:  cond(A'B)  = 
cond(Aff-1AT)  =  58. 

In  general,  the  structure  of  K  is  such  that  pivoting  on  any  nonzero  diagonals  of 
H  should  be  safe  if  the  following  conditions  hold: 

•  There  are  m  or  more  small  pivots  of  similar  size  (to  within  one  or  two  orders 
of  magnitude). 

•  The  associated  m  or  more  columns  of  A  form  a  well-conditioned  matrix. 

Unfortunately,  in  the  presence  of  primal  degeneracy  there  will  be  less  than  m  small 
pivots,  as  the  next  example  shows. 

5.2.  A  Degenerate  LP 

Now  let  the  right-hand  side  and  optimal  solution  be 

‘-(a)-  *,  =  (°  0  0  l)r 

At  the  start  of  the  sixth  iteration  of  the  primal-dual  algorithm,  we  have 
x  =  (3.6e-7,  6.0e-7,  9.3e-7,  0.9999987),  z  =  (0.61,  0.32,  0.29,  2.2e-7),  and 

1.7e+6  1 

5.3e+5  1 

3.1e+5  3 

2.2e-7  3 

113  3 

12  12 

Two  representative  values  of  Htol  give  the  following  reduced  KKT  systems: 


Htol 

A'b 

cond(  Kn) 

f  2.2e-7 

3 

2  \ 

0.1 

3 

-3e-5 

-le-5 

8e+5 

l  2 

-le-5 

-le-5  , 

le-20 

( 

-4e+7 

— 3e+7 

-3e+7  ^ 
— 2e+7  y 

|  le+13 

We  see  that  the  partially  reduced  system  has  a  considerably  lower  condition  than 
the  fully  reduced  system.  After  one  further  iteration,  the  contrast  is  even  greater 
(see  the  second  table  below). 
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5.3.  Condition  Numbers  at  Each  Iteration 

To  further  illustrate  the  effect  of  small  H  pivots,  we  list  the  condition  of  the  reduced 
KKT  systems  arising  at  each  iteration  of  the  primal-dual  algorithm  with  various 
values  of  Htol.  The  barrier  parameter  n  does  not  appear  in  K,  but  it  is  listed  for 
reference. 

For  the  first  (non-degenerate)  problem,  the  following  condition  numbers  cond(/tB) 
were  obtained: 


Itn 

Htol.  0.1 

le-3 

le-4 

1.8e-6 

le-20 

1 

3e-2 

14 

14 

14 

14 

14 

2 

2e-3 

72 

300 

300 

300 

300 

3 

2e-4 

8 

59 

59 

59 

59 

4 

2e-6 

7 

7 

5e+4 

74 

74 

5 

2e-8 

7 

7 

7 

5e+6 

59 

6 

2e-10 

7 

7 

7 

7 

7 

7 

2e-12 

7 

7 

7 

7 

7 

We  see  that  allowing  pivots  as  small  as  Hjj  =  10_p  is  likely  to  give  cond(A'B)  =  10p 
at  some  stage,  except  in  the  fortuitous  case  where  are  there  are  m  or  more  small 
pivots. 

For  the  degenerate  problem,  the  following  values  of  cond(/vB)  were  obtained: 


Itn 

Htol  0.1 

le-4 

le-20 

1 

2e-2 

14 

14 

14 

2 

2e-4 

23 

23 

23 

3 

6e-5 

11 

le+3 

le+3 

4 

2e-5 

114 

114 

7e+6 

5 

2e-7 

8e+3 

8e+3 

le+9 

6 

2e-9 

8e+5 

8e+5 

le+13 

7 

2e-ll 

8e+7 

8e+7 

2e+16 

8 

2e-13 

8e+9 

8e+9 

oo 

We  see  that  very  small  pivots  allow  the  condition  of  I(B  to  deteriorate  seriously. 
We  cannot  expect  a  method  based  on  AH~1AT  to  make  meaningful  progress  on  this 
example  beyond  the  sixth  iteration.  Since  degeneracy  is  a  feature  of  most  real-life 
problems,  it  seems  clear  that  small  H  pivots  must  be  avoided  if  stability  is  to  be 
assured. 

To  date,  implementations  based  on  AH~lAT  [LMS90]  or  some  other  “unstable” 
factorization  (VC91]  appear  capable  of  attaining  8  digits  of  precision  on  most  real- 
life  applications,  but  occasionally  attain  only  6  digits  or  less.  This  is  commendable 
performance,  since  6  digits  is  undoubtedly  adequate  for  most  practitioners.  It  is  the 
“occasionally  less”  that  wc  maintain  some  concern  about! 
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6.  Implementation  Details 

The  remaining  discussion  concerns  our  present  implementation  as  it  applies  to  LP 
problems  ( Q  =  0).  When  7  or  6  is  nonzero  in  (2.1)  the  problem  being  solved  is 
in  fact  a  QP.  A  diagonal  Q  could  easily  be  incorporated,  as  in  [CLMS90].  The 
implementation  is  called  PDQ1  (Primal-Dual  QP  code,  version  1).  We  intend  to 
allow  a  more  general  sparse  Q  in  the  future. 

Various  run-time  parameters  are  used  in  PDQ1  to  define  starting  points,  stopping 
conditions,  etc.  (see  below).  Note  that  they  are  applied  after  the  problem  has  been 
scaled.  We  assume  that  all  computations  are  performed  with  about  16  digits  of 
precision. 

We  do  not  perform  any  preprocessing  of  the  data  other  than  scaling.  For  exam¬ 
ple,  we  do  not  attempt  to  discard  any  rows  or  columns  of  A.  (Nor  do  we  attempt  to 
fix  variables  on  their  bounds  as  the  iterates  converge.)  In  practice,  of  course,  prepro¬ 
cessing  can  be  very  successful  in  reducing  the  problem  dimensions  and  improving  the 
numerical  performance  of  solution  algorithms.  Our  aim  is  to  deal  directly  with  the 
problem  data  and  achieve  reliability  in  the  presence  of  redundant  constraints,  null 
variables,  etc.  (since  preprocessors  are  not  guaranteed  to  eliminate  such  difficulties). 

We  make  an  exception  with  regard  to  scaling,  since  we  wish  to  solve  an  entire  set 
of  test  problems  with  a  single  set  of  run-time  parameters.  Without  scaling,  exces¬ 
sively  cautious  parameter  values  may  be  needed  to  achieve  reasonable  performance. 

6.1.  Scaling 

Row  and  column  scales  are  first  determined  by  an  iterative  procedure  that  tends 
to  make  th  •  elements  of  A  close  to  one  [Fou82j.  An  “effective  right-hand  side”  v  is 
then  defined  according  to 

v  =  b  -  53  aJl3  -  1L  ail3  -  £  ftr uj )  (6.1) 

/j=u;  lj>0  u;<0 

and  the  row  scales  are  applied  to  v  to  obtain  the  quantities  v  and  a  =  ||u||.  If  a  >  1, 
all  row  and  column  scales  are  multiplied  by  a. 

This  is  the  scaling  procedure  used  in  MINOS  [MS87].  In  most  cases  it  has  the 
effect  of  making  ||x*||  «  1,  where  x‘  is  the  solution  of  the  scaled  problem.  As 
noted  elsewhere  [GMSW89,Marx89],  the  test  problems  growl,  growl5  and  grow22 
are  exceptions  in  that  ||x*||  »  107.  To  assist  such  cases  we  have  implemented  an 
additional  scaling  that  takes  effect  if  v  =  0  in  (6.1). 

The  grow  problems  happen  to  be  of  the  form 

min  cTx  subject  to  Ax  =  0,  0  <  x  <  u. 

We  first  note  that  the  optimal  solution  satisfies  cTx*  <  0  (since  x  =  0  is  a  feasible 
point).  Assuming  x*  ^  0,  we  can  then  say  that  x*  =  u}  for  at  least  one  j  (since  if 
some  feasible  point  satisfies  0  <  x  <  u,  the  point  px  is  also  feasible  for  some  p  >  1 
and  it  has  an  improved  objective  value). 

It  follows  that  if  r  =  min^  u,  >  1,  a  smaller  ||i*||  will  result  if  all  the  scales  are 
multiplied  by  r. 


6.  Implementation  Details 
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For  the  grow  problems  we  found  that  r  as  3000,  and  the  additional  scaling  by 
r  reduced  ||z*||  from  107  to  104  and  improved  the  reliability  of  the  solves  with  K. 
The  effective  right-hand  side  v  was  zero  for  one  other  problem  ( sc205 ).  In  this  case, 
r  =  100  and  the  extra  scaling  reduced  the  iteration  count  from  15  to  11. 

6.2.  Scaling  c 

Once  scale  factors  are  obtained  as  above,  they  are  applied  to  the  data  .4,  b ,  c,  /,  u. 
A  further  scale  factor  is  then  applied  to  c  to  make  ||c||  «  l.3  In  most  cases  the  effect 
is  to  make  ||7f*|j  ss  1  for  the  scaled  problem. 

6.3.  Starting 

The  initial  values  for  the  scaled  primal  and  dual  variables  were  chosen  as  follows 
(with  £0  =  Co  =  1)-  Recall  that  the  initial  values  of  x  and  Tt  do  not  affect  the 
subsequent  iterations  (except  that  x  is  used  to  define  the  initial  si  and  S2). 

•  Xj  =  0  if  zero  lies  between  the  bounds;  otherwise,  xj  =  lj  or  ij,  whichever 
bound  is  nearest  zero. 

•  (si)j  =  max(f0,zj  -  lj );  (s2)j  =  ma x(£0,Uj  -  x}). 

•  7T  =  0. 

•  Vj  =  Zj  =  Co- 

The  initial  value  of  p.  was  set  to  balance  the  paits  of  the  residual  vector  in  (2.2 )  that 
do  and  do  not  depend  on  p  (with  po  =  0.1): 

•  /i  =  na\\f0\\2/y/nbound, 

where  nbound  (<  2n)  is  the  number  of  finite  upper  and  lower  bounds. 

6.4.  Stopping 

If  (x,si,S2)  and  (n,z,y)  are  primal  and  dual  feasible  respectively,  the  duality  gap 
(the  difference  between  the  primal  and  dual  objectives)  is 

sjz  +  s^y  -  (cTx  +  \xtQx  +  \pTp)  -  (bT7r  +  lTz  -  uTy  -  \xTQx  -  \pTp), 

where  Q  =  Q  +  72/.  The  stopping  criterion  for  LP  problems  required  the  following, 
with  6fea  =  Sopt  =  10_d  meaning  a  request  for  d  digits  of  accuracy  {d  =  6  or  8): 

•  IMk/(l  +  11*11) 

»  W(1  +  M)  <  w 

3The  Euclidean  norm  || .  jfe  is  required  for  terms  in  the  merit  function  ||/(1 1|2.  For  other  vectors 
v  of  length  n  we  define  ||v||  =  ^|u;|/\/«  (which  approximates  || vjfe  but  is  cheaper  to  evaluate). 
Since  c  may  be  sparse,  we  scale  it  by  £  lc;l/\/^c.  where  nc  is  the  number  of  nonzeros  in  c.  Using 
nc  in  place  of  n  affected  the  iteration  counts  for  many  of  the  test  problems — on  average  favorably 
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•  ( sfz  +  s$y)/(l  +  |cTx|)  <  £opt. 

After  each  iteration,  a  minimum  value  of  (x  is  defined  in  terms  of  the  objective 
function  and  the  optimality  tolerance.  In  the  LP  case, 

•  /bnin  =  (1  +  |cTx|)6Opt/(10n&otmc<). 

If  the  current  (i  is  below  2/inun,  then  n  is  not  reduced  for  the  next  iteration. 

6.5.  Regularization 

The  values  7  =  10-5  and  6  =  10-5  were  used,  with  seemingly  satisfactory  results. 
Larger  values  may  perturb  the  solution  too  much,  and  smaller  values  can  lead  to 
near-singularity  in  I(  and  perhaps  divergence  of  the  iterates. 

Recall  that  the  regularization  terms  in  the  objective  are  ^||7a:||2  -f  ^(|^7r||2,  with 
||:c||  ss  1  and  j|7r||  s»  1  near  a  solution.  For  many  problems  we  have  observed  that 
II^H  decreases  sharply  in  the  final  primal-dual  iterations,  showing  that  a  nonzero  S 
helps  resolve  some  ambiguity  in  the  dual  solution. 

In  fine  with  the  theory  of  [MM79,Man84],  there  is  no  similar  decrease  in  ||aj| 
when  7  is  rather  small. 

6.6.  Solving  the  KKT  Systems 

To  date  we  have  used  the  Harwell  Subroutine  Library  package  MA27  [DR82.DRS3] 
to  solve  the  reduced  KKT  systems.  This  is  a  multifrontal  code  designed  to  perform 
well  on  vector  machines  on  matrices  that  are  definite  or  nearly  definite  (i.e.,  most 
eigenvalues  have  the  same  sign). 

When  there  are  no  free  variables  or  dense  columns,  the  tolerances  are  such  that 
Kb  =  -AII~1At  for  most  of  the  early  iterations.  The  performance  should  then  be 
similar-  to  other  Cholesky-based  implementations. 

As  the  optimal  solution  is  approached,  many  diagonals  of  II  become  small  and 
Kb  becomes  more  indefinite  as  its  dimension  increases.  In  some  cases  the  MA27 
Factor  generates  substantially  more  nonzeros  than  predicted  by  the  Analyze,  and 
the  iteration  time  deteriorates  significantly. 

As  always,  improvements  in  computation  time  will  come  from  speeding  up  the 
KKT  solves.  A  major  modification  of  MA27  is  being  developed  as  MA47  [DGRST89], 
and  we  expect  that  its  performance  in  the  KKT  context  will  be  considerably  im¬ 
proved.  A  promising  alternative  is  the  code  described  recently  in  [FM91]. 


7.  Numerical  Results 


19 


7.  Numerical  Results 

In  this  section  we  present  results  obtained  from  the  netlib  collection  of  LP  test 
problems  [Gay85].  One  aim  is  to  explore  the  probable  dimensions  of  the  reduced 
KKT  systems  that  must  be  solved  (and  determine  how  often  a  new  Analyze  is 
needed). 

Another  aim  is  to  show  that  a  primal-dual  barrier  algorithm  similar  to  the  ones  in 
[Meh89,Meh90,LMS89,LMS90]  can  achieve  comparably  low  iteration  counts  without 
the  benefit  of  preprocessing  (other  than  scaling)  and  with  relatively  simple  starting 
conditions  and  a  straightforward  reduction  of  p  each  iteration.  This  is  partly  due 
to  the  improved  reliability  of  the  numerical  linear  algebra  in  the  presence  of  free 
variables,  dense  columns,  and  near-singularity. 

During  the  code  development,  occasional  high  iteration  counts  were  usually 
found  to  be  the  result  of  lax  tolerances  in  forming  and  solving  the  reduced  KKT 
systems  (just  as  a  simplex  code  could  be  expected  to  iterate  indefinitely  if  an  un¬ 
reliable  basis  package  were  used).  With  the  current  MA27  factorizer,  it  remains 
desirable  to  use  lax  tolerances  tentatively  (to  enhance  sparsity),  since  they  are  of¬ 
ten  adequate.  Provision  for  iterative  refinement  and  tightening  of  the  factorization 
tolerances  (Section  4.1)  seems  to  provide  a  reliable  safeguard. 

As  an  example,  most  of  the  test  problems  solved  successfully  with  the  tolerances 
fixed  at  Htol  —  10-8  and  factol  =  0.001.  This  is  extremely  lax  in  terms  of  con¬ 
ventional  Gaussian  elimination,  but  note  that  implementations  based  on  All~lAr 
are  effectively  using  Htol  =  0  and  factol  =  0,  with  no  increase  possible.  Iterative 
refinement  can  again  be  invoked,  but  that  alone  may  be  unsuccessful. 


7.1.  Dimension  of  the  Reduced  KKT  Systems 

Let  Tit  be  the  dimension  of  the  reduced  KKT  system  I\B  (3.3)  at  iteration  I:.,  am! 
let  r*  =  nit/m.  The  first  graph  in  Figure  1  plots  the  ratios  r/.  for  a  representative 
selection  of  problems  (using  Htol  =  10-6  and  requesting  8  digits  of  accuracy).  The 
name  of  each  problem  appears  near  the  end  of  the  associated  plot. 

The  value  r*  =  1  implies  that  1\B  =  All~'AT  at  iteration  k.  For  example, 
problems  scsdS  and  shipl2l  both  give  fully  reduced  systems  at  the  beginning,  since 
all  elements  of  H  are  of  order  1  initially  md  there  are  no  dense  columns.  In  contrast, 
j-fc  «  2  for  most  of  the  pilots  iterations,  because  almost  m  columns  of  A  contain  10 
or  more  nonzeros  and  are  included  in  B  throughout. 

In  general,  r*,.  stays  almost  constant  until  the  final  iterations,  when  many  diago¬ 
nals  of  H  start  falling  below  Htol.  The  dimension  of  KB  increases  as  more  columns 
of  A  are  included  in  B. 

Similarly,  let  nzf.  be  the  number  of  nonzeros  in  the  MA27  factorization  of  I\B  at 
iteration  k ,  and  let  f*.  =  nzk/nz\.  (The  minimum  number  of  nonzeros  happens  to 
occur  at  the  first  iteration.)  The  second  graph  in  Figure  1  plots  the  ratios  f\.  for 
the  same  problems.  The  values  fk  >  2  represent  a  serious  loss  of  sparsity  in  order 
to  preserve  stability. 

Some  observations  follow. 
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Figure  1:  The  dimension  of  the  reduced  KKT  systems  KB  (relative  to  m),  and  the 
number  of  nonzeros  in  the  MA27  factors  (relative  to  the  first  factorization). 


•  The  dimension  of  I(B  changes  from  its  previous  value  rather  more  than  half 
of  the  time.  This  determines  how  many  times  a  new  Analyze  is  needed.  Some 
statistics  are  given  in  Table  1. 

•  The  cpu  time  for  an  MA27  Analyze  is  usually  moderate  compared  to  a  Factor , 
but  sometimes  it  can  be  substantial.  Table  1  shows  how  much  time  is  spent 
in  each  phase,  as  a  percentage  of  the  total  cpu  time.  (There  is  normally  one 
Factor  per  iteration,  except  on  rare  occasions  when  the  stability  tolerances  are 
tightened.) 

•  There  is  typically  a  sharp  increase  in  the  MA27  Factor  nonzeros  during  the 
final  iterations.  In  particular,  requesting  8  digits  of  accuracy  rather  than  6 
carries  a  substantial  cost. 

These  matters  reflect  the  cost  of  stability  compared  to  implementations  based  on 
A1I~xAt  (for  which  r/t  =  ft  =  1  throughout). 


7.  Numerical  Results 


SI 


Analyze 

Factor 

Analyze  time 

Factor  time 

scsd6 

11 

12 

20% 

32% 

shipl2l 

14 

19 

18% 

30% 

25fvf7 

14 

23 

13% 

72% 

pilotja 

19 

32 

16% 

75% 

pilots 

20 

35 

15% 

81% 

80bau3b 

38 

42 

29% 

45% 

degenS 

9 

28 

29% 

66% 

Table  1:  The  number  of  MA27  Analyze  and  Factor  calls,  and  the  percentage  of  time 
spent  in  each. 

7.2.  Performance  on  the  netlib  Test  Set 

Table  2  lists  the  iteration  counts  for  PDQ1  in  solving  the  first  70  LP  test  problems 
in  netlib.  We  give  results  for  both  6  and  8-digit  accuracy.  They  are  compared 
to  the  iteration  counts  recorded  by  OBI  in  [LMS90],  where  8-digit  accuracy  was 
also  requested  (and  in  most  cases  obtained).  OBI  used  some  preprocessing  of  the 
data  [LMS89].  The  column  labeled  “DifT  indicates  significant  differences  between 
columns  1  and  3.  A  +  means  PDQ1  required  more  iterations  than  OBI. 

Some  observations  follow. 

•  The  stability  tolerances  Htol  =  10-6  and  factol  =  0.01  proved  to  be  reliable 
in  almost  all  cases.  The  grow  problems,  scsdS ,  pilotja  and  pilots  required 
iterative  refinement  at  certain  points  and  ultimately  made  one  request  for 
stricter  tolerances  ( Htol  =  10-4  and  factol  =  0.1). 

•  The  grow  problems  illustrate  the  effect  of  inaccurate  solution  of  the  KKT  sys¬ 
tem.  For  the  last  few  iterations  there  were  symptoms  of  difficulty  (refinement, 
reversion  to  pure  primal-dual,  backtracking  in  the  linesearch)  and  stricter  tol¬ 
erances  were  finally  requested  for  the  last  iteration.  When  Iltol  =  10-6  and 
factol  =  0.1  were  used  from  the  beginning,  no  symptoms  of  numerical  error 
arose  and  the  iteration  counts  were  19,  21  and  21  (a  significant  improvement 
for  the  last  two  cases,  though  still  more  than  achieved  by  OBI). 

•  It  is  not  clear  why  significantly  more  iterations  were  required  for  cycle ,  wood lp 
and  woodw.  These  problems  are  from  the  same  source  and  probably  have  a 
distinguishing  characteristic  that  would  explain  the  OBI  advantage. 

•  The  first  three  pilot  problems  solved  in  significantly  fewer  iterations  than  with 
OBI.  About  80  free  variables  and  some  notoriously  narrow  bounds  ( l}  =  0, 
Uj  =  10~5)  do  not  appear  to  have  caused  difficulty. 

•  Otherwise,  we  see  that  the  iteration  counts  for  OBI  and  PDQ1  are  comparable 
for  most  problems. 
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Code: 

Digits: 

OBI 

8 

PDQl 

6 

PDQl 

8 

Diff 

Code: 

Digits: 

OBI 

8 

PDQl 

6 

PDQl 

8 

Diff 

25fv47 

25 

23 

23 

recipe 

10 

13 

15 

+ 

80bau3b 

38 

37 

42 

sc205 

11 

11 

13 

adlittlc 

12 

12 

13 

scagr25 

16 

16 

17 

afiro 

9 

9 

10 

scagr7 

12 

13 

14 

agg 

24 

19 

21 

scfxml 

17 

17 

18 

agg2 

18 

19 

21 

scfxm2 

19 

19 

20 

aggS 

17 

19 

21 

scfxmS 

20 

19 

21 

bandm 

17 

16 

18 

scorpion 

14 

12 

14 

beaconfd 

10 

18 

21 

+4* 

scrs8 

27 

20 

22 

« 

boreSd 

18 

21 

23 

+ 

scsdl 

11 

9 

9 

brandy 

19 

17 

19 

scsd6 

12 

11 

12 

capri 

18 

19 

20 

scsd8 

10 

11 

15 

+ 

cycle 

30 

37 

43 

+++ 

sctapl 

15 

14 

15 

czprob 

35 

33 

36 

sctap2 

20 

15 

15 

- 

degen2 

14 

15 

17 

sctapS 

17 

16 

16 

degenS 

20 

26 

28 

+ 

seba 

19 

18 

20 

e226 

22 

18 

20 

sharelb 

20 

17 

18 

etamacro 

29 

19 

23 

- 

share2b 

12 

11 

12 

ffffloo 

28 

23 

27 

shell 

21 

20 

21 

forplan 

0. 

23 

25 

ship04l 

15 

17 

18 

gauges 

16 

16 

18 

ship04s 

15 

17 

18 

gfrdpnc 

18 

17 

18 

ship08l 

16 

16 

18 

greenbea 

41 

48 

51 

++ 

ship08s 

14 

16 

17 

greenbeb 

33 

34 

38 

+ 

shipl2l 

18 

17 

19 

grow7 

14 

17 

19 

ship!  2s 

18 

17 

18 

growlS 

16 

18 

26 

++ 

sierra 

18 

18 

20 

grow22 

16 

18 

30 

++ 

stair 

16 

17 

17 

israel 

23 

22 

23 

standata 

15 

17 

19 

kb2 

15 

13 

13 

slandmps 

24 

24 

25 

nesm 

30 

23 

30 

stocforl 

19 

13 

13 

- 

pilotja 

46 

28 

31 

— 

stocJor2 

22 

33 

33 

+  + 

pilotwe 

46 

30 

33 

— 

tuff 

19 

24 

27 

+ 

pilot  4 

36 

25 

27 

— 

vtpbase 

13 

15 

18 

+ 

pilolnov 

20 

16 

17 

woodlp 

14 

27 

29 

+++ 

pilots 

29 

31 

34 

+ 

woodw 

20 

26 

30 

+  + 

Table  2:  Iteration  counts  for  OBI  (requesting  8  digits)  and  PDQl  (6  and  8  digits). 


•  In  terms  of  cpu  time,  the  results  would  be  far  from  comparable.  We  defer  such 
statistics  until  a  more  efficient  indefinite  solver  is  installed. 


8.  Conclusions 
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8.  Conclusions 

For  various  good  reasons,  most  interior-point  codes  for  LP  (and  for  separable  QP) 
have  been  based  on  Cholesky  factors  of  matrices  of  the  form  AH~lA iT.  Excellent  per¬ 
formance  has  been  achieved  (notably  by  [LMS90,Meh90,CLMS90])  and  the  primary 
sources  of  difficulty  have  been  thought  to  be  dense  columns  and  free  variables. 

For  general  QP  problems  a  full  KKT  system  must  be  solved  [Pon90],  and  it  is  be¬ 
coming  increasingly  recognized  that  such  an  approach  removes  the  above-mentioned 
difficulties  (e.g.  [Tur90,GMPS90,FM91,Van91]). 

Here  we  have  emphasized  the  fact  that  the  real  source  of  numerical  error  lies  in 
pivoting  on  small  diagonals  of  H  in  the  presence  of  primal  degeneracy.  Reducing 
a  KKT  system  K  to  is  equivalent  to  pivoting  on  all  diagonals  of  //,  re¬ 

gardless  of  size.  We  suggest  forming  “reduced  KKT  systems”  by  pivoting  on  just 
the  diagonals  of  H  that  are  suitably  large.  This  allows  us  to  avoid  factorizing  a  full 
KKT  system,  and  often  leads  to  use  of  AII~lAT  h\  the  early  iterations  when  it  is 
numerically  safe. 

The  primary  advantage  is  intended  to  be  numerical  reliability.  The  drawbacks 
are  that  a  new  Analyze  is  required  each  time  the  reduced  KKT  system  changes  in 
dimension,  and  that  we  are  dependent  on  the  efficiency  of  a  symmetric  indefinite 
factorizer.  Some  new  codes  [DGRST89,FM91]  promise  to  narrow  the  gap  between 
indefinite  and  definite  solvers.  A  variant  of  “reduced  KKT  systems”  has  recently 
been  given  in  [Van91].  An  advantage  is  that  it  avoids  the  need  for  an  indefinite 
solver,  but  in  its  present  form  it  is  susceptible  to  the  normal  dangers  of  small  pivots 
in  II. 

In  terms  of  overall  strategy,  it  remains  to  be  seen  which  of  the  approaches  in 
[LMS90,Tur90,FM91,Van91]  and  the  present  paper  will  offer  the  most  favorable 
balance  between  efficiency  and  reliability. 
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